Human Genomics — Latest Matching Preprints

1

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.1%

6.4%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of variants of uncertain significance (VUSs) and the clinical misinterpretation of genomic data, especially in Middle Eastern populations. Whole-exome sequencing was performed on 90 healthy individuals from Jordan, and the data were analysed using principal component analysis (PCA) and multiple computational filters. PCA revealed dual-ancestry (EUR-AFR) admixture rather than triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched relative to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p < 0.05). Based on their significant allele frequency (AF = 0.0389, p < 0.05), the results support reclassifying VUSs in the ECE2 gene as likely benign and variants with conflicting classifications of pathogenicity in IL1RN and THPO as benign. Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature of the Jordanian population. The study serves as a foundational resource for precision medicine in the region.
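The enrichment test above pairs a per-variant Fisher's exact test with Benjamini-Hochberg correction. Below is a minimal Python sketch under stated assumptions: each variant is summarized as a 2x2 table of alternate and reference allele counts in the cohort versus a reference population. That table layout is our illustrative framing, since the abstract compares against maximum allele frequencies in public databases without saying how the table was built. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cohort, reference population; columns: alt alleles, ref alleles
    (an illustrative layout, not necessarily the study's).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (cohort alt, cohort ref, reference alt, reference ref).
    # 90 individuals contribute 180 alleles per autosomal site.
    tables = {
        "variant_A": (7, 173, 40, 9960),
        "variant_B": (2, 178, 30, 9970),
        "variant_C": (1, 179, 10, 9990),
    }
    raw = {name: fisher_exact_two_sided(*t) for name, t in tables.items()}
    adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
    for name in tables:
        print(f"{name}: p = {raw[name]:.3g}, BH-adjusted = {adj[name]:.3g}")
```

Exact enumeration with `math.comb` keeps the sketch dependency-free; in practice a library routine such as SciPy's `fisher_exact` would normally be used.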

2

Evo 2 Predicts Cardiomyopathy-Associated Variants and Elucidates Their Underlying Mechanisms

Kurozumi, A.; Otsuka, N.; Masamichi, I.; Kawakami, T.; Isagawa, T.; Kodera, S.; Takeda, N.

2026-05-17 genomics 10.64898/2026.05.15.725304 medRxiv

Top 0.1%

4.4%

Background: Although advances in next-generation sequencing have accelerated the identification of genetic variants in cardiomyopathy, interpreting variants of uncertain significance (VUS) remains a clinical challenge. Evo 2 is a high-resolution genomic artificial intelligence model capable of predicting pathogenicity across large sequence contexts and enabling mechanistic interpretation; however, its application in cardiovascular genetics has been limited. Here, we evaluated the utility of Evo 2 for assessing the pathogenicity and underlying mechanisms of cardiomyopathy-associated variants. Methods: We used Evo 2 to predict the pathogenicity of single-nucleotide variants in cardiomyopathy-related genes listed in ClinVar. We assessed the model's ability to identify characteristic structural features in both coding and noncoding regions using internal representations such as embeddings, and to infer the molecular mechanisms of variants within these regions. Results: Evo 2 demonstrated high predictive accuracy for pathogenicity, achieving an AUROC of 0.983 and an AUPRC of 0.915. Notably, sparse autoencoders (SAEs) applied to the embeddings identified features corresponding to higher-order structures, including coiled-coil and actin-binding domains characteristic of cardiomyopathy-related proteins, and accurately detected mutations known to disrupt these domains. Using SAEs, the model recognized the binding motif of the cardiac-enriched transcription factor TBX5, and after supervised fine-tuning it accurately predicted a single-nucleotide polymorphism affecting TBX5 binding affinity. Conclusions: Evo 2 demonstrated strong performance both in predicting pathogenicity and in extracting biological features of cardiomyopathy-associated variants. It may represent a powerful emerging tool for evaluating VUS in cardiovascular medicine.
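The AUROC and AUPRC reported above summarize how well a per-variant score separates pathogenic from benign ClinVar variants. A minimal Python sketch of both metrics follows, assuming a higher score means "more likely pathogenic"; how Evo 2's outputs were turned into that score is not stated in the abstract. AUROC uses the pairwise Mann-Whitney formulation, and AUPRC is computed as average precision, which is one common step-wise estimator among several.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC: mean of the precision at the rank of each positive."""
    # sorted() is stable, so tied scores keep their input order
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = 0
    precisions = []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            precisions.append(tp / k)
    if not precisions:
        raise ValueError("need at least one positive")
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    # Hypothetical labels (1 = pathogenic, 0 = benign) and model scores
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.95, 0.40, 0.80, 0.35, 0.30, 0.10]
    print(f"AUROC = {auroc(labels, scores):.3f}, AUPRC = {average_precision(labels, scores):.3f}")
```

The pairwise loop is quadratic and meant only to make the definition explicit; a rank-based computation gives the same AUROC in O(n log n).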

3

Exome Sequencing Identifies POPDC2 as a Candidate Gene for Familial Congenital Junctional Ectopic Tachycardia

Helm, B. M.; Swan, A. H.; Rinne, S.; Pfuhl, M.; De Martino, E.; Kean, A. C.; Decher, N.; Brand, T.

2026-05-17 cardiovascular medicine 10.64898/2026.05.12.26353039 medRxiv

Top 0.1%

1.7%

Background: Congenital junctional ectopic tachycardia (cJET) is a rare, potentially life-threatening arrhythmia suspected to have a genetic basis, yet its molecular underpinnings remain incompletely defined. The POPDC2 gene, involved in cardiac pacemaking and membrane trafficking of interacting ion channels, has not previously been conclusively linked to human tachyarrhythmias. This study investigates a novel POPDC2 variant (p.Leu245Pro) identified in a family with autosomal dominant cJET. Methods: Exome sequencing was performed to identify co-segregating variants in the affected family. The POPDC2 p.Leu245Pro variant was functionally analyzed using molecular dynamics (MD) simulations, a membrane targeting assay, and a bimolecular fluorescence complementation assay. Additionally, the impact of the variant on Nav1.5 and TREK-1 currents was characterized in Xenopus oocytes. Results: The p.Leu245Pro POPDC2 variant destabilized the POPDC1-POPDC2 dimer interface, resulting in impaired heterodimer formation and membrane localization. Electrophysiological studies in Xenopus oocytes demonstrated that the mutant protein significantly affected Nav1.5 and TREK-1 currents. These findings support a functional impact of the POPDC2 p.Leu245Pro variant relevant to cardiac conduction. Conclusions: Our results provide the first functional evidence implicating POPDC2 in cJET and support its role as a novel candidate gene in tachyarrhythmic disease. This study enhances the understanding of genetic contributions to cJET and warrants further investigation of POPDC2 in other forms of supraventricular tachyarrhythmia.

4

Investigating the Y chromosome in complex disease: Phenome-wide scan across 104,334 Finnish men

Preussner, A.; Leinonen, J. T.; FinnGen; Pirinen, M.; Tukiainen, T.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26355235 medRxiv

Top 0.1%

1.5%

Although the Y chromosome represents roughly 2% of the male genome, it is often ignored in genome-wide association studies (GWAS). Consequently, the potential health impacts of Y-chromosomal genetic variation remain incompletely understood. To fill this gap, we performed a phenome-wide association study (PheWAS) in FinnGen across 1,426 binary and quantitative traits using Y-chromosomal variation (frequency ≥1%) in 104,334 genotyped men. Because Y-chromosome variation is prone to population stratification, we performed carefully adjusted association analyses and further examined the results through kin-based validation in 19,275 female and 24,712 male first-degree relatives. We found 121 suggestive (p < 5.6×10⁻³) phenotypic associations with the Y chromosome, yet none reached phenome-wide significance (p < 3.9×10⁻⁶). Although only 38 associations were supported in the kin-based validation, we found support for a previously suggested link between haplogroup I1 and coronary heart disease (CHD; OR = 1.06, 95% CI 1.02-1.11, p = 3.7×10⁻³; male validation OR = 1.05; female validation OR = 0.97). The I1-CHD association was detected across distinct geographical areas within Finland and was independent of loss of Y (LOY) and autosomal risk for CHD, suggesting a link between germline Y-chromosomal variation and heart disease risk. Overall, this study presents a comprehensive phenome-wide analysis of Y-chromosomal associations, highlighting the potential relevance of Y-chromosomal variation beyond sex determination. Our findings further emphasize the need for improved capture of Y-chromosomal variants and further analyses of biobank-scale data to allow deeper exploration of the male-specific genetic architecture of complex diseases.
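Association results such as OR = 1.06 (95% CI 1.02-1.11) are typically derived from a regression coefficient on the log-odds scale. The sketch below assumes a Wald-type interval computed from a hypothetical coefficient `beta` and standard error `se`; the abstract does not state the exact model, so this is illustrative only. It also applies the suggestive and phenome-wide thresholds quoted in the abstract.

```python
import math

SUGGESTIVE_P = 5.6e-3     # suggestive threshold quoted in the abstract
PHENOME_WIDE_P = 3.9e-6   # phenome-wide significance threshold quoted in the abstract
Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def wald_p(beta, se):
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


def classify(p):
    """Label a p-value against the two thresholds above."""
    if p < PHENOME_WIDE_P:
        return "phenome-wide significant"
    if p < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"


if __name__ == "__main__":
    beta, se = 0.058, 0.020  # hypothetical values, not taken from the study
    or_, lo, hi = odds_ratio_ci(beta, se)
    p = wald_p(beta, se)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p={p:.2g}: {classify(p)}")
```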

5

Benchmarking sequence performance on the DNBSEQ-T7 using Genome in a Bottle reference genomes

van Coller, A.; Taukobong, S.; Malima, M.; Ghoor, S.; Nangammbi, N.; Roode, E.; Naicker, M.; Cole, V.; Glanzmann, B.; Kinnear, C.; Carstens, N.

2026-05-26 bioinformatics 10.64898/2026.05.22.727100 medRxiv

Top 0.2%

1.3%

Advances in sequencing technologies have improved the accuracy, throughput, and completeness of human genome characterization, enabling more reliable detection of genetic variation. Well-characterized reference genomes are critical for benchmarking sequencing platforms and bioinformatics analysis pipelines. Here, we present whole genome sequencing datasets generated for the Ashkenazi Jewish trio reference samples from the Genome in a Bottle Consortium. Libraries were prepared using three distinct MGI-based workflows: PCR-free library preparation, FastFS DNA library preparation, and Universal DNA library preparation. Sequencing was performed on the MGI DNBSEQ-T7 platform, generating a minimum of 400 million paired-end reads per sample, corresponding to 30X mean genome coverage. Raw reads were processed using a standardized GATK bioinformatics workflow. Sequencing performance and variant detection accuracy were evaluated using the Genome in a Bottle high-confidence benchmark variant sets. All workflows demonstrated high sequencing quality and concordance with GIAB benchmark truth sets, with PCR-free libraries showing the strongest indel calling performance and lowest Mendelian violation rates across the Ashkenazi trio. This dataset provides a resource for benchmarking DNBSEQ-T7 sequencing and bioinformatics workflows, and for evaluating the impact of library preparation strategies on whole genome variant detection performance.
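Mendelian violation rates, used above to compare library preparations across the trio, count sites where the child's genotype cannot be formed from one maternal and one paternal allele (in the GIAB Ashkenazi trio, HG002 is the son and HG003 and HG004 are the father and mother). A minimal sketch, assuming unphased autosomal genotypes given as allele-index tuples at sites called in all three samples; missing calls, sex chromosomes, and the actual GATK tooling used in the study are out of scope, and the function names are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's diploid genotype can be formed from one maternal and one paternal allele.

    Genotypes are unphased allele-index tuples such as (0, 1); autosomal sites only.
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def mendelian_violation_rate(trio_sites):
    """Fraction of sites, genotyped in all three samples, that violate Mendelian inheritance."""
    sites = list(trio_sites)
    if not sites:
        return 0.0
    violations = sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return violations / len(sites)


if __name__ == "__main__":
    # Hypothetical (child, mother, father) genotypes at three autosomal sites
    sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
        ((1, 1), (0, 0), (0, 1)),  # violation: mother carries no alt allele
        ((0, 0), (0, 1), (0, 1)),  # consistent
    ]
    print(f"Mendelian violation rate: {mendelian_violation_rate(sites):.3f}")
```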

6

Evaluation of the Contribution of Natural Selection to Greater Cardiometabolic Disease Risk in South Asian Populations

Searby, D. J. C.; Hemani, G.; Chong, A.; Lawson, D. J.; Chaturvedi, N. J.; Davey Smith, G.

2026-05-22 genetic and genomic medicine 10.64898/2026.05.15.26353234 medRxiv

Top 0.2%

1.2%

Greater genetic susceptibility has been proposed as an explanation for the higher rates of cardiovascular and metabolic disease in South Asian relative to European populations. We first demonstrate that, after accounting for technical artefacts, the genetic effects for related traits are largely consistent between ancestral groups, which downplays the role of GxG or GxE interactions in driving the differential prevalence. If higher genetic susceptibility in South Asians is due to selective pressures acting through adiposity-related traits in the evolutionary past, signatures of selection should be evident at loci associated with cardiometabolic disease and other causally related traits (e.g. fat distribution). We tested for enrichment of several selection statistics (FST, XP-EHH and XP-nSL) at loci associated with a range of traits related to cardiometabolic disease, in comparison to a null distribution of SNPs matched for linkage disequilibrium (LD) score and minor allele frequency (MAF). Loci associated with a subset of these traits (type 2 diabetes mellitus, trunk fat percentage, body fat percentage and trunk fat mass) exhibited enrichment for FST, consistent with a moderate adaptive explanation for their cross-population differentiation. In contrast, none of the studied traits were enriched for haplotype-based statistics, indicating that cross-population genetic divergence is unlikely to have been driven by recent selective sweeps and has more likely arisen from either ancient selection or recent polygenic selection acting on standing variation.
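The FST enrichment test compares trait-associated loci against LD-score- and MAF-matched null SNPs. A minimal sketch, assuming Hudson's FST estimator without finite-sample corrections and a one-sided empirical p-value on the mean per-SNP FST; the abstract specifies neither the FST estimator nor how the null distribution was summarized, and the matching step is represented only by precomputed null sets. Function names and frequencies are hypothetical.

```python
from statistics import mean


def hudson_fst(p1, p2):
    """Per-SNP Hudson FST from allele frequencies in two populations.

    Omits the finite-sample correction terms, so it assumes large samples.
    """
    numerator = (p1 - p2) ** 2
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else 0.0


def empirical_enrichment_p(trait_fst, null_sets):
    """One-sided empirical p-value: how often a matched null set has mean FST >= the trait loci."""
    observed = mean(trait_fst)
    exceed = sum(mean(null) >= observed for null in null_sets)
    # The +1 terms keep the estimate away from zero with a finite number of null draws
    return (1 + exceed) / (1 + len(null_sets))


if __name__ == "__main__":
    # Hypothetical allele frequencies (South Asian, European) at trait-associated loci
    trait = [hudson_fst(p, q) for p, q in [(0.42, 0.25), (0.10, 0.22), (0.61, 0.48)]]
    # Hypothetical matched null sets, each the same size as the trait set
    nulls = [[hudson_fst(p, q) for p, q in s] for s in [
        [(0.30, 0.28), (0.15, 0.18), (0.50, 0.47)],
        [(0.40, 0.33), (0.12, 0.10), (0.55, 0.60)],
        [(0.20, 0.40), (0.35, 0.15), (0.70, 0.52)],
    ]]
    print(f"mean trait FST = {mean(trait):.4f}, empirical p = {empirical_enrichment_p(trait, nulls):.2f}")
```

Real analyses draw thousands of matched null sets; three are used here only to keep the example readable.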

7

Liver-to-Atria Inflammatory Axis Driving Arrhythmia

Yuan, Y.; Wang, S.; Ding, J.; Jiang, J.; Zeng, Y.; Li, T.; Shinohara, A. K.; Lin, C.; Sun, C.; Hoogeveen, R. C.; Chelu, M. G.; Saadatagah, S.; Jung, S. Y.; Olivares-Villagomez, D.; Ballantyne, C. M.; Dong, B.; Li, N.

2026-05-20 systems biology 10.64898/2026.05.19.726408 medRxiv

Top 0.3%

0.8%

Background: Metabolic dysfunction-associated steatohepatitis (MASH) is emerging as a risk factor for cardiometabolic diseases, including atrial fibrillation (AF), the most common sustained arrhythmia. Given that the liver is a major source of inflammatory mediators, lipids, and hepatokines under metabolic stress, we hypothesized that hepatocyte-derived factors in MASH may accelerate atrial remodeling and arrhythmogenesis. Methods: Data from visit 5 of the Atherosclerosis Risk in Communities (ARIC) cohort were analyzed to determine the association between the FIB-4 index, a classic indicator of liver fibrosis, and AF risk, with multivariable adjustment for common comorbidities. A murine model of MASH was induced using the GAN (Gubra-Amylin NASH) diet. Programmed intracardiac stimulation and echocardiography were performed to assess AF susceptibility and cardiac function. Calcium imaging, histology, flow cytometry, plasma proteomics, and single-nucleus RNA sequencing (snRNA-seq) were used to elucidate the role of inflammatory macrophages recruited via hepatocyte-derived osteopontin (OPN) in MASH-induced atrial remodeling. Results: Analysis of the ARIC cohort showed a higher cumulative incidence of AF and an elevated adjusted hazard ratio (HR) in individuals with intermediate and high FIB-4 indices compared with those with low FIB-4 scores. MASH mice exhibited increased susceptibility to pacing-induced AF, accompanied by enhanced proarrhythmic calcium release events, atrial enlargement, and fibrosis, independent of ventricular dysfunction. Proteomics and snRNA-seq revealed that hepatocyte-secreted OPN under MASH conditions promoted the differentiation and recruitment of TGFBR1+ inflammatory macrophages to the atria, leading to activation of gasdermin D (GSDMD), an effector of inflammasome signaling, and consequent proarrhythmic atrial remodeling.
Activation of the monocyte-derived pro-inflammatory TGFBR1+ macrophages was dependent on the OPN receptor CD44. Furthermore, the MASH-induced atrial fibroinflammatory milieu and enhanced AF susceptibility were mitigated by several strategies, including hepatocyte-specific deletion of Spp1 (encoding OPN), neutralization of circulating OPN, and ablation of CD44 or GSDMD. Conclusions: These findings establish a pathogenic role for the hepatokine osteopontin in driving the activation and recruitment of TGFBR1+ inflammatory macrophages into the atria, leading to proarrhythmic atrial remodeling under MASH. Osteopontin-targeted therapy or GSDMD inhibition prevents AF, indicating a novel therapeutic strategy for liver disease-related atrial arrhythmogenesis.
Clinical Perspective
What is new?
- In the ARIC cohort, metabolic dysfunction-associated steatohepatitis (MASH) is associated with increased risk of atrial fibrillation (AF) after adjusting for common comorbidities. Elevated levels of circulating osteopontin (encoded by SPP1) predict an increased risk of AF in patients with MASH-induced liver fibrosis.
- MASH enhances hepatocyte secretion of osteopontin, leading to expansion of myeloid cells and recruitment of inflammatory macrophages into the atria. This liver-to-atrial inflammatory circuit promotes the development of a substrate conducive to AF, which can be attenuated by eliminating the mediator (hepatocyte-specific Spp1 deletion or neutralizing anti-osteopontin antibody treatment) or by correcting the atrial response (ablation of the inflammasome effector gasdermin D).
What are the clinical implications?
- Osteopontin may serve as a biomarker for AF in MASH cohorts.
- Anti-osteopontin therapy with neutralizing antibodies may serve as a novel therapeutic strategy for liver disease-related atrial arrhythmia.
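The FIB-4 index used in the ARIC analysis is a simple formula: FIB-4 = age (years) × AST (U/L) / (platelet count (10⁹/L) × √ALT (U/L)). A minimal sketch follows. The cutoffs of 1.30 and 2.67 are commonly used thresholds and are an assumption here, since the abstract does not state which cutoffs defined its low, intermediate, and high groups (age-specific cutoffs have also been proposed for older adults).

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 = age x AST / (platelets x sqrt(ALT))."""
    if platelets_1e9_per_l <= 0 or alt_u_per_l <= 0:
        raise ValueError("platelet count and ALT must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def fib4_category(score, low_cut=1.30, high_cut=2.67):
    """Three-tier grouping; the default cutoffs are assumed, not taken from the study."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical patient: 50 years, AST 40 U/L, ALT 16 U/L, platelets 200 x 10^9/L
    score = fib4(50, 40, 16, 200)
    print(f"FIB-4 = {score:.2f} ({fib4_category(score)})")
```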

8

Optical genome mapping identifies source-associated structural variant differences across early-passage human iPSCs

Namvar, L.; Sedov, K.; Yang, M. J.; Hermosillo, R.; Zafar, F.; Schuele, B.

2026-05-31 genomics 10.64898/2026.05.29.728843 medRxiv

Top 0.3%

0.8%

Show abstract

Background: Induced pluripotent stem cells (iPSCs) are an important model for studying human diseases in vitro. However, previous studies have shown that iPSC reprogramming and extended cell culture can introduce genomic structural variants (SVs). Compared with optical genome mapping (OGM), technologies such as karyotyping, CNV microarrays, and whole-genome sequencing have limitations in resolution, sensitivity, or the ability to detect large and complex structural variants. OGM is a genome-wide SV detection method that analyzes fluorescently labeled ultra-high-molecular-weight DNA molecules to identify copy-number changes and balanced rearrangements. At sufficient coverage, OGM can detect SVs of approximately 2 kbp or larger and identify mosaic events supported by molecule-level evidence, offering higher resolution than conventional karyotyping or SNP-array-based QC. Here, we compared iPSC clones derived from peripheral blood mononuclear cells (PBMCs) and fibroblasts (FBCs) to determine whether the starting somatic cell source is associated with differences in SV burden and SV-type profiles after nuclear reprogramming into iPSCs. Results: We analyzed 73 low-passage iPSC clones generated from 25 parental lines using OGM. Compared with PBMC-iPSCs, FBC-iPSCs showed a higher SV burden, with enrichment of duplications ≥100 kbp and more frequent overlap with protein-coding genes, fragile sites, and recurrent chromosomal hotspot regions. In contrast, PBMC-iPSCs showed fewer SVs overall and a higher proportion of clones without detectable clone-specific SVs. Conclusions: OGM provides a high-resolution approach for post-reprogramming genomic quality control by detecting clone-specific SVs of approximately 2 kbp or larger, including events below the resolution of conventional cytogenetic and SNP-array-based assays.
In these early-passage iPSCs, SVs overlapped protein-coding genes, fragile sites, and recurrent culture-associated chromosomal regions, underscoring the need for clone-level genomic assessment before downstream applications. FBC-derived iPSCs showed a higher SV burden, including more frequent and larger duplications, whereas PBMC-derived iPSCs more often lacked detectable clone-specific SVs. These findings suggest that PBMC-iPSCs and FBC-iPSCs can differ in post-reprogramming SV profiles and support the use of OGM as a QC strategy during iPSC generation and selection.
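The per-clone SV-burden comparison above reduces to filtering calls by size and testing overlap with annotated intervals. A minimal sketch under assumed data structures: the `SV` record, the half-open interval convention, and the gene list are hypothetical, and insertions and balanced events such as translocations, whose size is not a simple reference span, are not modeled. The 2 kbp and 100 kbp thresholds come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class SV:
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    sv_type: str  # e.g. "deletion" or "duplication"

    @property
    def size(self):
        return self.end - self.start


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def summarize_clone(svs, genes, min_size=2_000, large_dup=100_000):
    """Per-clone counts of SVs >= min_size, large duplications, and gene-overlapping SVs.

    `genes` is a list of (chrom, start, end) tuples for protein-coding genes.
    """
    called = [sv for sv in svs if sv.size >= min_size]
    return {
        "sv_burden": len(called),
        "large_duplications": sum(sv.sv_type == "duplication" and sv.size >= large_dup for sv in called),
        "gene_overlapping": sum(
            any(sv.chrom == g_chrom and overlaps(sv.start, sv.end, g_start, g_end)
                for g_chrom, g_start, g_end in genes)
            for sv in called
        ),
    }


if __name__ == "__main__":
    # Hypothetical calls for one clone and one hypothetical gene interval
    clone = [
        SV("chr1", 0, 1_500, "deletion"),            # below 2 kbp, not counted
        SV("chr1", 10_000, 150_000, "duplication"),  # large duplication overlapping the gene
        SV("chr2", 5_000, 9_000, "deletion"),
    ]
    print(summarize_clone(clone, [("chr1", 100_000, 120_000)]))
```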

9

Dynamic Shifts in the Oral Microbiota Following Cancer Surgery: A 172-Sample Longitudinal Study of Surgical Site Infection Risk

Serpa, M. S.; Defelicibus, A.; Bartelli, T. F.; Tojal da Silva, I.; Nunes, D. N.; Kowalski, L. P.; Dias-Neto, E.

2026-05-21 oncology 10.64898/2026.05.18.26353519 medRxiv

Top 0.3%

0.8%

Background: Surgical site infection (SSI) is the leading cause of perioperative morbidity following oral cancer surgery, yet the role of the oral microbiota in SSI pathogenesis remains poorly defined. This study prospectively investigated microbiota dynamics in relation to SSI occurrence in patients undergoing resection of oral squamous cell carcinoma (OSCC). Methods: A total of 172 oral swab samples were collected from 45 OSCC patients at four longitudinal time points: baseline (~29 days pre-surgery), immediately pre-surgery (hospital admission), early post-surgery (within 5 days), and late post-surgery (6 to 15 days). Bacterial composition was profiled by 16S rDNA V3-V4 sequencing (172 successfully sequenced samples), and bacterial/human DNA ratios were quantified by qRT-PCR (170 samples evaluated). SSI was assessed within 30 days post-surgery using adapted CDC criteria. Results: Fourteen of 45 patients (31.1%) developed SSI. Younger age was significantly associated with SSI occurrence (median age 53.2 years in the SSI group vs. 67.4 years in the non-SSI group; p=0.011), with each one-year decrease in age conferring a 7% increase in risk. Notably, younger patients presented with larger and more advanced tumors (T3/T4: median age 57.2 vs. 72.9 years for T1/T2; p=0.033), leading to more extensive surgical procedures. Across all 172 samples, surgery induced a marked post-operative reduction in bacterial load and diversity. However, at the late post-surgery time point (collection IV), patients with SSI exhibited significantly higher alpha-diversity than non-infected patients (p<0.05 for Observed, Shannon, and Simpson indices). Beta-diversity also differed significantly between groups at this time point (weighted UniFrac, p=0.043). Prevotella and Porphyromonas dominated in SSI patients at infection, together accounting for ~40% of reads versus 9.5% in non-infected patients.
Among the 172 samples analyzed longitudinally, Aggregatibacter abundance at the early post-surgery time point (collection III) emerged as a significant predictor of subsequent SSI (OR per 1% increase: 1.10; p=0.012), with relative abundances >0.044% conferring a 5.7-fold higher risk. Conclusions: Our longitudinal analysis demonstrates that, while OSCC surgery profoundly disrupts the oral microbiota, non-SSI patients restore their preoperative profile within 12 days. In contrast, SSI is characterized by persistent dysbiosis dominated by Prevotella and Porphyromonas. Younger patients with advanced tumors are at particular risk. Early post-surgical Aggregatibacter abundance may serve as a novel risk indicator for SSI, potentially enabling timely preventive interventions in high-risk patients.
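The alpha-diversity indices compared between SSI and non-SSI patients can be computed from per-sample taxon counts. A minimal sketch: Observed is the number of taxa with non-zero counts, Shannon is H = -Σ pᵢ ln pᵢ, and Simpson is reported here in the Gini-Simpson form 1 - Σ pᵢ², which is one common convention (some tools report Σ pᵢ² or its inverse, and the abstract does not say which was used). The relative-abundance check reuses the 0.044% Aggregatibacter cut-point from the abstract purely as an illustrative flag; it does not reproduce the study's risk model.

```python
import math

AGGREGATIBACTER_CUTOFF_PCT = 0.044  # cut-point quoted in the abstract (percent of reads)


def alpha_diversity(counts):
    """Observed richness, Shannon (natural log), and Gini-Simpson diversity from taxon counts."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in present]
    return {
        "observed": len(present),
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1 - sum(p * p for p in props),
    }


def above_cutoff(taxon_reads, total_reads, cutoff_pct=AGGREGATIBACTER_CUTOFF_PCT):
    """True if the taxon's relative abundance (percent of reads) exceeds the cut-point."""
    return 100 * taxon_reads / total_reads > cutoff_pct


if __name__ == "__main__":
    # Hypothetical genus-level read counts for one swab
    sample = [5200, 2100, 900, 450, 5, 0]
    print(alpha_diversity(sample))
    print("Aggregatibacter above cut-point:", above_cutoff(5, sum(sample)))
```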

10

Benchmarking of local ancestry inference with different assays and parameters

Motegi, T.; Huang, F.; Campbell, J. D.

2026-05-21 genomics 10.64898/2026.05.18.726085 medRxiv

Top 0.3%

0.8%

Local ancestry inference (LAI) enables high-resolution characterization of chromosomal segments inherited from distinct ancestral populations, offering unique insights into genetic architecture in admixed cohorts. While LAI is commonly performed with high-coverage whole-genome sequencing (WGS), the performance of other genotyping assays and of lower sequencing depths has not been thoroughly benchmarked. In this study, we systematically evaluated the accuracy of LAI across SNP microarrays, whole-exome sequencing (WES), and ultra-low-pass WGS (ULP-WGS) using diverse validation samples and state-of-the-art imputation pipelines. We show that ULP-WGS paired with GLIMPSE2 achieves robust accuracy at 0.25x coverage with a minimum genome window size of 0.5 centimorgans, with mean accuracy minus one standard deviation exceeding 95%. For WES, using on-target reads alone yields suboptimal performance, particularly for European and South Asian ancestries, with accuracies of less than 79.1% and 70.6%, respectively. However, incorporating off-target reads and using GLIMPSE2 substantially improved WES accuracy to ≥95% with a minimum window size of 0.2 centimorgans. We further evaluated formalin-fixed, paraffin-embedded (FFPE) samples and found that LAI could be performed successfully using WES data, with accuracies of ≥95% at a minimum window size of 0.5 centimorgans. In contrast, SNP microarrays did not exceed 95% accuracy at any window size. Together, these results demonstrate that LAI is achievable without conventional high-coverage WGS and establish optimal parameters for LAI across platforms.
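Local ancestry accuracy at a given resolution can be scored by comparing inferred and true ancestry calls within genetic-map windows. A minimal sketch, assuming a single haplotype track, non-overlapping windows of `window_cm` centimorgans, a majority-vote ancestry label per window, and accuracy as the fraction of windows whose labels agree; the abstract does not define its window-based metric in this detail, so these choices and the function names are hypothetical. The pass criterion mirrors the abstract's "mean accuracy minus one standard deviation exceeding 95%".

```python
from collections import Counter
from statistics import mean, stdev


def windowed_calls(positions_cm, calls, window_cm):
    """Majority ancestry label in each non-overlapping window of `window_cm` centimorgans."""
    bins = {}
    for pos, call in zip(positions_cm, calls):
        bins.setdefault(int(pos // window_cm), []).append(call)
    # Counter.most_common breaks ties by first occurrence within the window
    return {w: Counter(c).most_common(1)[0][0] for w, c in bins.items()}


def lai_accuracy(positions_cm, truth, inferred, window_cm):
    """Fraction of windows whose inferred majority ancestry matches the truth."""
    t = windowed_calls(positions_cm, truth, window_cm)
    i = windowed_calls(positions_cm, inferred, window_cm)
    return sum(t[w] == i[w] for w in t) / len(t)


def meets_criterion(per_sample_accuracy, threshold=0.95):
    """Mean accuracy minus one standard deviation exceeds the threshold (needs >= 2 samples)."""
    return mean(per_sample_accuracy) - stdev(per_sample_accuracy) > threshold


if __name__ == "__main__":
    # Hypothetical calls along one haplotype, positions in cM
    pos = [0.1, 0.2, 0.6, 0.7, 1.1, 1.3]
    truth = ["EUR", "EUR", "AFR", "AFR", "AFR", "AFR"]
    inferred = ["EUR", "EUR", "AFR", "EUR", "AFR", "AFR"]
    print(f"accuracy at 0.5 cM: {lai_accuracy(pos, truth, inferred, 0.5):.2f}")
    print("criterion met:", meets_criterion([0.97, 0.98, 0.99]))
```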

11

Connecting Baseline Immune Exhaustion in Hot Tumors to Oral Cancer Recurrence and Nodal Metastasis

Shaikh, S.; Basu, S.; Hajihosseini, M.; Nandy, S. K.; Moorthy, M.; Arun, I.; Lali, B. S.; Arun, P.; Mukherjee, G.; Pyne, S.

2026-05-30 oncology 10.64898/2026.05.27.26354295 medRxiv

Top 0.4%

0.7%

Background: The use of immune checkpoint inhibitors (ICIs) in the treatment of cancer has rapidly expanded over the last decade. However, there are several knowledge gaps in understanding how tumor cells evade the immune system, and there is a paucity of data in HPV-negative oral cancer, particularly cancer of the gingivobuccal region. Understanding the mechanism of immune evasion in this cancer is vital for improving patient outcomes. Methods: We characterized the baseline immune milieu of oral cancer using immunohistochemistry (IHC) on whole tumor sections from 124 cases. Tumors were classified as hot or cold and further stratified into high-risk and low-risk groups. High-risk patients were those with lymph node metastasis at diagnosis or recurrence, or distant metastasis within 2 years of treatment completion; patients without these features were categorized as low risk. Validation by RNA-Seq and Joint Enrichment Analysis of Oncogenic and Immunologic Pathways was carried out in a subset of 46 cases. Results: Hot high-risk (HHR) tumors (by IHC) were distinguished by elevated PD-L1 expression and reduced NK-cell, PD1, and CTLA-4 expression. Compared with hot low-risk tumors, there was no difference in the expression levels of CD3+, CD8+, granzyme, or perforin, findings that align with the definition of hot tumors. RNA-Seq revealed a gene signature associated with exhausted T-cells in HHR tumors. Gene and pathway analyses identified differential upregulation of isoform-specific TOX, TCF, CXCR, RUNX, IRF, BRD and BCL6 genes, implicating immune cell exhaustion and tumor aggressiveness. Significantly downregulated genes included PDCD1, HAVCR2, ZAP70, and STAT, indicative of a disabled immune microenvironment. These findings suggest that the state of immune exhaustion in HHR tumors is driven by progenitor exhausted and terminally exhausted T-cells, independent of PD1-TIM3.
Conclusion: These findings suggest that combining TOX/TCF/BCL6 inhibitors with immune checkpoint inhibitors in the adjuvant setting might benefit patients with hot high-risk tumors. Given the results, testing for a targeted exhaustion-related gene panel at diagnosis is recommended for oral cancers to stratify tumors as high-risk or low-risk. Larger validation studies and clinical trials are now warranted.

12

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.4%

0.7%

Background: Inherited retinal diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. This underrepresentation of other ancestries creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using large-scale deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating automated InterVar classification based on the 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution of highly mutated IRD-associated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive (AR) IRD genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant (AD) IRDs exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2).
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
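Carrier frequency and genetic prevalence for autosomal recessive genes are usually derived from the summed frequency q of P/LP alleles per gene under Hardy-Weinberg equilibrium: CF ≈ 2q(1 - q) and GP ≈ q². A minimal sketch follows. Combining genes as 1 - Π(1 - CF_g) for the cumulative CF (assuming independence across genes) and Σ q_g² for the total GP are assumptions, since the abstract does not give its aggregation formulas. Gene labels and frequencies in the usage lines are hypothetical.

```python
import math


def gene_burden(pathogenic_afs):
    """Summed P/LP allele frequency q for one AR gene, with HWE carrier frequency and prevalence."""
    q = sum(pathogenic_afs)
    return {"q": q, "carrier_freq": 2 * q * (1 - q), "prevalence": q * q}


def population_summary(genes):
    """Cumulative carrier frequency and total genetic prevalence across AR genes.

    Assumes independence across genes for the cumulative carrier frequency.
    """
    burdens = [gene_burden(afs) for afs in genes.values()]
    cumulative_cf = 1 - math.prod(1 - b["carrier_freq"] for b in burdens)
    total_gp = sum(b["prevalence"] for b in burdens)
    return cumulative_cf, total_gp


def as_one_in(freq):
    """Express a frequency as '1 in N'."""
    return f"1 in {1 / freq:,.2f}"


if __name__ == "__main__":
    # Hypothetical P/LP allele frequencies per gene
    genes = {"GENE_1": [0.004, 0.002], "GENE_2": [0.003], "GENE_3": [0.0015, 0.0005]}
    cf, gp = population_summary(genes)
    print(f"cumulative CF = {as_one_in(cf)}, total GP = {as_one_in(gp)}")
```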

13

Automated histopathological measurements of the tumor micro-environment predict distant metastasis after stage I/II Melanoma: discovery and validation in the population-based Dutch Early-Stage Melanoma (D-ESMEL) study

Kerkour, T.; Hollestein, L.; Nigg, A.; Li, Y.; Damman, J.; Zhou, C.; Nijsten, T.; Mooyaart, A.

2026-06-03 dermatology 10.64898/2026.06.02.26354705 medRxiv

Top 0.4%

0.7%

Background: More than half of metastatic melanomas arise in patients initially diagnosed with early-stage melanoma. Objective biomarkers are needed to better identify high-risk patients. Objective: To evaluate the prognostic value of multiple histopathological characteristics for predicting distant metastasis risk in early-stage melanoma. Methods: Using data from a discovery set (n=442) and a population-based validation cohort (n=306, sampled from 5,815 patients) of the Dutch Early-Stage Melanoma (D-ESMEL) study, we investigated 14 histopathological characteristics of the melanomas and their tumor micro-environment (TME) in an unprecedented integration of expert pathologist scoring and automated quantitative measurements derived from a validated automated segmentation. Results: Increased immune infiltrates (40% in cases vs. 50% in controls) were associated with a lower risk of metastasis. Automated immune cell density was predictive in both the discovery set and the validation cohort, outperforming manual pathologist scoring of tumor-infiltrating lymphocytes. The remaining histopathological features, including mitotic activity, did not retain independent value after controlling for current staging variables. Limitations: TME evaluation was limited to standard hematoxylin-eosin (H&E) slides. Conclusion: The TME reaction is an important determinant of melanoma progression. Automated quantification of immune cell density appears to be a biomarker for distant metastasis risk. Further investigation into specific immune cell subtypes is required to facilitate clinical integration.

14

Transcriptomic Profiling and Regulatory Network Analysis of Ten Metabolic Transporters Across Five Diabetic Complications: A Multi-Dataset, Twelve-Phase GEO Bioinformatics Study

Adegboyega, B. B.; Ekanem, P. C.; Awolaja, O. O.; Osarietin, E.; Okorie, B.

2026-05-27 bioinformatics 10.64898/2026.05.23.727195 medRxiv

Top 0.4%

0.7%

Objective: Diabetic complications collectively represent one of the most urgent unresolved problems in medicine, yet the field continues to study them in near-complete isolation from one another. No unified framework has systematically characterised the shared and divergent molecular signatures of ten clinically critical metabolic transporters across all five major complications (cardiomyopathy [DCM], nephropathy [DN], retinopathy [DR], peripheral neuropathy [DPN], and atherosclerosis and vasculopathy [DAD]) through an integrated, multi-method computational pipeline. This study was designed to address that gap directly. Methods: Eleven GEO microarray datasets comprising 118 diabetic and 76 control samples were analysed through twelve sequential phases: differential expression analysis; pan-complication overlap; weighted gene co-expression network analysis (WGCNA); GO/KEGG functional enrichment with gene set enrichment analysis (GSEA); STRING protein-protein interaction (PPI) network construction; competing endogenous RNA (ceRNA) network mapping; transcription factor activity inference using a VIPER-style algorithm; immune cell infiltration estimation by single-sample GSEA; diagnostic biomarker modelling using LASSO logistic regression and random forest classification; CMap-style drug repurposing by connectivity scoring; and two-sample Mendelian randomisation (MR) employing four independent estimators (inverse-variance weighted [IVW], MR-Egger, weighted median, and weighted mode). Results: CD36 was the only transporter to achieve significant dysregulation across three independently sourced tissue types (DN, DR, DPN; logFC range 0.88 to 2.18), whilst TLR4 exhibited the highest fold-change in the study (logFC = 3.88, DPN) and the greatest WGCNA module membership (kME = 0.976, DPN).
SERCA2 was significantly downregulated in three complications (DCM, DN, and DR) at formal significance thresholds and trended negatively in the remaining two (DPN and DAD), making it the most consistently suppressed transporter in the study. This downregulation could be explained by four convergent mechanisms spanning transcriptional, oxidative, ceRNA-mediated, and transcription factor-level regulation, and was confirmed as causally relevant to diabetic cardiomyopathy by eQTL Mendelian randomisation (beta = -0.085, p = 0.005). miR-21-5p was identified as the dominant ceRNA regulatory bridge (betweenness centrality = 0.428; 6.7-fold above the second-ranked miRNA), with MALAT1 as the sole lncRNA hub active in all five complications. PPARγ and TP53 repression emerged as the leading transcription factor-level explanations for the simultaneous metabolic and inflammatory dysregulation characteristic of the diabetic transcriptome. Immune deconvolution revealed DCM as immunologically quiescent, DN as comprehensively infiltrated (ten enriched cell types), and DPN as mast-cell-dominated, identifying a cellular mechanism for TLR4-driven neuroinflammation that has not previously been systematically characterised. GLUT4 achieved perfect diagnostic discrimination for DPN (AUC = 1.000, p < 0.001; LASSO coefficient = -2.143), whilst SGLT2 was the leading DAD diagnostic marker (AUC = 1.000, p = 0.002). Epalrestat was the sole pan-complication drug repurposing candidate (significant connectivity reversal in four of five complications). Mendelian randomisation confirmed causal effects of T2DM genetic liability on all five complications (all p < 0.0001, all four estimators concordant), and eQTL-MR identified TLR4 (beta = +0.073, p = 0.006) and CD36 (beta = +0.070, p = 0.008) as causal risk factors for DN, reduced SERCA2 expression as a causal driver of DCM (beta = -0.085, p = 0.005), and SGLT2 expression as a causal protector against DN (beta = -0.070, p = 0.013).
Conclusions: This twelve-phase investigation identifies a pan-complication CD36/TLR4 inflammatory dyad and a SERCA2 calcium-mitochondrial effector axis, both confirmed at seven independent analytical levels, including causal genomic inference. GLUT4 downregulation defines DPN at the diagnostic level with perfect accuracy and can be explained by a five-layer mechanistic chain from MODY transcription factor inactivation to ceRNA competitive pressure. Epalrestat warrants prospective evaluation beyond its established DPN indication. These findings collectively constitute the most comprehensive computational characterisation of metabolic transporter biology in diabetic complications to date.
RESEARCH IN CONTEXT
What is already known about this subject? The five major diabetic complications (cardiomyopathy, nephropathy, retinopathy, peripheral neuropathy, and atherosclerosis) are individually well characterised, and several key metabolic transporters, including SGLT2, CD36, TLR4, SERCA2, and GLUT4, have established roles in one or more of these conditions. Mendelian randomisation has confirmed that T2DM genetic liability causally increases the risk of each complication independently. However, no study has examined all ten major metabolic transporters across all five complications simultaneously, and the shared versus complication-specific regulatory architectures of these transporters remain entirely uncharacterised.
What is the key question? Which metabolic transporters are consistently dysregulated across all five diabetic complications, which are complication-specific, and can their shared regulatory mechanisms, from RNA regulation through to causal genetic evidence, be used to identify diagnostic biomarkers and actionable therapeutic targets that transcend individual complication boundaries?
What are the key findings and their implications for the field? CD36 and TLR4 constitute a pan-complication inflammatory dyad confirmed at seven independent analytical levels, including Mendelian randomisation causal evidence (both p < 0.01 for diabetic nephropathy). SERCA2 is suppressed across all five complications (significantly in three), and reduced SERCA2 expression is a causal driver of diabetic cardiomyopathy by eQTL-MR (p = 0.005). GLUT4 is a perfect single-gene diagnostic for diabetic peripheral neuropathy (AUC = 1.000) and a causal renal protector. Mast cells are identified as the innate cellular effectors of TLR4-driven diabetic neuropathy. Epalrestat demonstrates pan-complication therapeutic potential beyond its licensed DPN indication. These findings provide a unified mechanistic framework and a translational roadmap grounded in causal genomic evidence, with implications for both complication-targeted and pan-complication therapeutic strategies.
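The abstract names IVW as one of four MR estimators but gives no formulas or per-SNP data. As a minimal sketch of what a fixed-effect IVW estimate computes from summary statistics (per-SNP exposure effects, outcome effects, and outcome standard errors), the Python below pools per-SNP Wald ratios with first-order weights. The function name and the input numbers are hypothetical; this is not the authors' pipeline, and random-effects IVW, MR-Egger, weighted median, and weighted mode are not shown.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomisation.

    For each SNP j the Wald ratio is beta_outcome[j] / beta_exposure[j];
    IVW pools these using first-order weights beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (estimate, standard_error, two_sided_p).
    """
    n = len(beta_exposure)
    if n == 0 or not (n == len(beta_outcome) == len(se_outcome)):
        raise ValueError("need equal-length, non-empty inputs")
    # Weighted mean of Wald ratios == sum(bx*by/se^2) / sum(bx^2/se^2)
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, standard_error, p_value


# Hypothetical summary statistics for three instruments (not from the study)
est, se, p = ivw_mr([0.10, 0.20, 0.15], [0.02, 0.04, 0.03], [0.01, 0.01, 0.01])
print(f"IVW beta = {est:.3f}, SE = {se:.4f}, p = {p:.2e}")
```

Because every toy SNP has the same Wald ratio (0.2), the pooled estimate equals that ratio; with real instruments, heterogeneity between ratios is what the other three estimators are designed to probe.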

15

Proximity labeling reveals unique and shared interactomes of unmodified and pyroglutamate amyloid beta in human hippocampus in Alzheimer's disease

Alia, A. O.; Urquhart, K.; Carson, H.; Killinger, B. A.; Janson, C.; Romanova, L.

2026-05-17 neuroscience 10.64898/2026.05.13.724866 medRxiv

Top 0.5%

0.7%

Amyloid plaques are a hallmark neuropathological feature of Alzheimer's disease (AD) and are composed of insoluble amyloid beta (Aβ) peptide. Aβ undergoes post-translational modifications that alter its biophysical properties, aggregation kinetics, and neurotoxicity, creating a heterogeneous pool of species that differentially affect AD pathogenesis. Pyroglutamate-modified Aβ (pEAβ) is a particularly aggregation-prone and proteolytically resistant variant that preferentially accumulates within plaque cores, is implicated in early plaque seeding, and is a major target of emerging anti-amyloid immunotherapies. However, the molecular environment surrounding pEAβ versus unmodified Aβ (pan-Aβ) in the human hippocampus remains incompletely defined. Here, we used Biotinylation by Antibody Recognition (BAR), an in situ proximity labeling approach, to map and compare the protein interaction networks (interactomes) of pEAβ and pan-Aβ in formalin-fixed postmortem human hippocampal tissue from pathologically confirmed AD cases and cognitively normal (CN) controls. Differential proteomic analysis identified 48 significantly enriched proteins in AD pEAβ captures, 28 in AD pan-Aβ captures, and 15 in CN pan-Aβ captures, whereas no significant enrichment was detected in CN pEAβ captures, supporting pEAβ as a pathology-associated species. pEAβ in AD showed the largest variant-specific signature, with 31 unique proteins; pan-Aβ showed 11 unique proteins in AD and 14 in CN. Sixteen proteins were shared between AD pEAβ and AD pan-Aβ, and PCSK1N was shared across AD pEAβ and both AD and CN pan-Aβ. Pathway enrichment analysis linked pEAβ to broader biological disruptions, including synaptogenesis signaling, clathrin-mediated endocytosis, mitochondrial division signaling, and neurotransmitter release.
Shared pathways included SNARE signaling, glutamatergic receptor signaling, and netrin signaling. These findings demonstrate that pEAβ engages an expanded, variant-specific interactome in the human AD hippocampus and identify intracellular trafficking, synaptic signaling, and mitochondrial pathways as network-level vulnerabilities relevant to pEAβ pathology in AD. Notably, comparing CN with AD pan-Aβ further distinguished protein networks associated with physiological Aβ engagement from those associated with pathological pan-Aβ deposition.
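The unique and shared protein counts above come from comparing lists of significantly enriched proteins across capture conditions. The abstract does not define "unique" precisely, so this minimal sketch assumes each capture's result is available as a set of protein identifiers and that a protein is unique to a capture when no other capture lists it. The function name and every identifier other than PCSK1N are hypothetical placeholders, and the enrichment statistics themselves are not reproduced.

```python
def capture_overlaps(captures):
    """Summarise overlap between enriched-protein lists.

    captures: dict mapping a capture label to the set of proteins
    significantly enriched in that capture.
    Returns (unique, pairwise_shared, shared_by_all); a protein is treated
    as unique to a capture if no other capture lists it (an assumption).
    """
    unique = {}
    for label, proteins in captures.items():
        # Union of every other capture's proteins
        others = set().union(*(p for other, p in captures.items() if other != label))
        unique[label] = proteins - others
    labels = sorted(captures)
    pairwise_shared = {
        (a, b): captures[a] & captures[b]
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }
    shared_by_all = set.intersection(*captures.values()) if captures else set()
    return unique, pairwise_shared, shared_by_all


# Hypothetical toy input: P1-P4 are placeholders, not study results.
captures = {
    "AD_pEAbeta": {"PCSK1N", "P1", "P2"},
    "AD_panAbeta": {"PCSK1N", "P2", "P3"},
    "CN_panAbeta": {"PCSK1N", "P4"},
}
unique, pairwise, common = capture_overlaps(captures)
print(unique, pairwise, common)
```

Note that a protein shared by all three captures (here PCSK1N) also appears in every pairwise intersection, so pairwise counts and "shared by all" counts overlap by construction.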

16

Structural distance at the tRNA synthetase active site interface predicts pathogenicity but is captured by AlphaMissense and EVE except among score-ambiguous variants

Liebeskind, K.; Francklyn, C.; Barrantes Reynolds, R.

2026-05-26 bioinformatics 10.64898/2026.05.22.727252 medRxiv

Top 0.5%

0.7%

Variants of uncertain significance (VUSs) have accumulated as genomic sequencing has become more widespread, complicating rare disease diagnosis and requiring substantial resources for re-evaluation. Aminoacyl-tRNA synthetases (ARSs) are a protein family with extensive variant data and well-characterized disease associations, making them an ideal system for investigating the relationship between variant location and pathogenicity. Using structural distance measurements to the ARS-tRNA binding interface together with two existing pathogenicity predictors, AlphaMissense and EVE, we investigated whether explicit structural binding information could improve missense variant pathogenicity prediction. Pathogenic variants clustered significantly closer to the tRNA-binding interface than benign variants (p = 0.0003). Incorporating explicit distance information into a Bayesian mixture model did not substantially improve predictive performance over AlphaMissense and EVE alone, suggesting that these models already implicitly capture the relevant structural binding context. However, a clinically important subset of interface variants that both models score as ambiguous marks a specific gap where explicit structural distance may add discriminative value; the small number of clinically validated variants currently available limits how fully this potential can be evaluated. Incorporating additional biologically relevant features not captured by existing models, such as protein stability or conformational dynamics, and refining the structural distance calculations may further improve classification of this subset. These findings highlight both the power and the limitations of existing pathogenicity predictors and suggest that structurally informed approaches targeting the binding interface are a promising direction for classifying these clinically significant ambiguous variants.
Author Summary: Advances in clinical genetic sequencing have led to the identification of a growing number of genetic variants whose impact on human health is unknown. These "variants of uncertain significance" present a major challenge because their role in causing disease cannot yet be confirmed or ruled out. This study focuses on a family of essential enzymes called aminoacyl-tRNA synthetases, which play a critical role in protein translation. Mutations in these enzymes have been linked to a range of diseases. This project aims to provide a new method for determining the pathogenicity of variants specifically in aminoacyl-tRNA synthetases. We propose that the physical proximity of a variant to the enzyme's functional binding site influences its pathogenicity, and we find that this spatial relationship is a meaningful indicator of a variant's potential to disrupt normal function.
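The core structural feature in this entry is a variant's distance to the tRNA-binding interface. The abstract does not say whether distances are taken from Cα atoms, side-chain atoms, or residue centroids, so this minimal sketch assumes atom coordinates have already been extracted from a structure file and takes the smallest atom-to-atom Euclidean distance between the variant residue and any interface atom. The function name and coordinates are hypothetical; comparing the resulting distances between pathogenic and benign groups would require a statistical test the abstract does not name.

```python
import math


def min_distance_to_interface(residue_atoms, interface_atoms):
    """Smallest Euclidean distance between any atom of a variant residue and
    any atom of the binding interface (same units as the input, e.g. angstroms).

    residue_atoms, interface_atoms: sequences of (x, y, z) coordinates.
    """
    if not residue_atoms or not interface_atoms:
        raise ValueError("need at least one atom in each set")
    return min(math.dist(a, b) for a in residue_atoms for b in interface_atoms)


# Hypothetical coordinates, not taken from any ARS structure
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
interface = [(4.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
print(min_distance_to_interface(residue, interface))
```

The brute-force double loop is fine for a single residue against an interface of a few hundred atoms; for whole-proteome scans a spatial index (e.g. a k-d tree) would be the usual choice.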

17

Liver biopsy confirms precise and efficient correction of SERPINA1 after in vivo Base Editing in a Patient with Alpha-1 Antitrypsin Deficiency

Krooss, S. A.; Yang, T.; Yuan, Q.; Drick, N.; Sgodda, M.; Held, J.; Behrendt, P.; Hartleben, B.; Koczulla, R.; Ma, X.; Liu, Y.; Wedemeyer, H.; Janciauskiene, S.; Di Donato, N.; Cantz, T.; Wang, E.; Wu, Y.; Hoeper, M.; Xia, Q.; Ott, M.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.01.26354551 medRxiv

Top 0.6%

0.5%

Background: Alpha-1 antitrypsin deficiency (AATD) caused by the PI*ZZ genotype (homozygosity for the Glu342Lys Z allele) results in hepatic accumulation of misfolded AAT-Z protein and reduced circulating AAT levels, leading to progressive liver disease and emphysema. Gene correction therapy is a potentially curative approach that directly corrects the underlying genetic defect. We report the first case of successful hepatic gene correction with early histological and functional assessment. Methods/Case presentation: We report the case of a 66-year-old male patient with PI*ZZ AATD who underwent gene correction therapy within the YOLT-202 phase I/Ia clinical trial (ClinicalTrials.gov ID NCT07193615). Ten weeks after treatment, a liver biopsy was performed to re-evaluate the F2 liver fibrosis measured by elastography before study entry. Serum samples allowed functional assessment of AAT-mediated elastase inhibition. Results: The liver biopsy showed no signs of hepatic inflammation and demonstrated gene correction rates of 54% (Sanger) and 57% (Illumina) for the PI*Z variant at the DNA level, with no bystander edits or off-target effects. Following a transient elevation of transaminases in the early post-treatment period, liver enzymes normalized. Monthly serum AAT measurements showed biologically active and stable therapeutic levels throughout follow-up. Conclusions: This case demonstrates efficient and precise hepatic gene correction without concerning histological alterations and with substantial improvement in functional parameters, supporting the feasibility and safety of gene editing approaches for AATD.
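The abstract reports DNA-level correction rates and the absence of bystander edits but not how they were computed. As a minimal sketch, assuming pre-aligned, indel-free amplicon reads of equal length, the correction rate can be read as the fraction of reads carrying the corrected base at the variant position, and a crude bystander rate as the fraction of reads with any other change inside a chosen window. The function name, sequences, positions, and window are hypothetical toy values, not the SERPINA1 locus; dedicated amplicon-analysis tools such as CRISPResso2 additionally handle alignment, indels, and sequencing error, which this sketch does not.

```python
def editing_outcomes(reads, unedited_ref, target_pos, corrected_base, window):
    """Per-read summary of a targeted single-base correction.

    reads: aligned amplicon reads, same length as unedited_ref, no indels.
    unedited_ref: amplicon sequence before editing (carrying the variant).
    target_pos: 0-based index of the variant position within the amplicon.
    corrected_base: base expected at target_pos after correction.
    window: 0-based positions scanned for unintended (bystander) changes.
    Returns (correction_rate, bystander_rate) as fractions of all reads.
    """
    if not reads:
        raise ValueError("no reads supplied")
    if any(len(r) != len(unedited_ref) for r in reads):
        raise ValueError("reads must be pre-aligned to the amplicon length")
    corrected = sum(1 for r in reads if r[target_pos] == corrected_base)
    # A read counts as a bystander read if any non-target window position differs
    bystander = sum(
        1 for r in reads
        if any(r[i] != unedited_ref[i] for i in window if i != target_pos)
    )
    return corrected / len(reads), bystander / len(reads)


# Toy example: 'A' at index 2 is the hypothetical variant base, 'G' the corrected base.
reads = ["ACGGT", "ACGGT", "ACAGT", "GCGGT"]
print(editing_outcomes(reads, "ACAGT", 2, "G", range(5)))
```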

18

Pre- and Post-synapses Contain Lecanemab-reactive Amyloid-β in Post-mortem Human Alzheimer's Disease Brain

Holt, K.; Chang, Y. Y.; Li, M.; Albertini, G.; Smith, C.; Tulloch, J.; De Strooper, B.; Hardingham, G. E.; Spires-Jones, T. L.

2026-05-18 neurology 10.64898/2026.05.08.26352549 medRxiv

Top 0.6%

0.5%

Recently, the amyloid-beta (Aβ)-targeting antibody lecanemab has demonstrated modest therapeutic efficacy in slowing cognitive decline in people with Alzheimer's disease (AD). Lecanemab clears amyloid plaques from the brain; however, plaque load does not correlate strongly with cognitive function. The strongest neuropathological correlate of cognitive decline in AD is synapse loss, which is exacerbated in the halo surrounding neuritic amyloid plaques, where Aβ accumulates in the remaining synapses. Here, we hypothesised that, by clearing plaques and the associated halo of soluble Aβ that can directly damage synapses, lecanemab could temper plaque-associated synapse loss. High-resolution imaging of temporal cortex tissue from people who died with AD (N=20) and age-matched controls (N=19) revealed lecanemab staining within individual pre- and post-synaptic excitatory terminals in addition to plaque staining. The percentage of pre-synapses containing lecanemab-positive Aβ was more than 200% higher in AD than in control tissue, and the percentage of post-synapses was more than 150% higher, with the highest levels of synaptic lecanemab staining observed near plaques. These data demonstrate that lecanemab recognises Aβ within synapses, warranting future work to determine whether lecanemab treatment slows cognitive decline at least in part by both clearing plaques and facilitating the clearance or neutralisation of synaptic Aβ.

19

A T2T-CHM13 recombination map and globally diverse haplotype reference panel improves phasing and imputation

Lalli, J. L.; Bortvin, A. N.; McCoy, R. C.; Werling, D. M.

2026-05-28 genomics 10.1101/2025.02.24.639687 medRxiv

Top 0.6%

0.5%

The T2T-CHM13 complete human reference genome contains ~200 Mb of previously unresolved sequence, improving read mapping and variant calling compared to GRCh38. However, the benefits of using complete reference genomes for phasing and imputation are unclear. Here, we present a reference T2T-CHM13 recombination map and phased haplotype panel derived from 3,202 samples from the 1000 Genomes Project (1kGP). Using published long-read based assemblies as a reference-neutral ground truth, we compared our T2T-CHM13 1kGP panel to the previously released GRCh38 1kGP panel. We found that alignment to T2T-CHM13 resulted in 38% fewer assembly-discordant SNP genotypes and 16% fewer phasing switch errors. The largest gains in panel accuracy were observed on chromosome X and in the regions flanking loci prone to disease-causing CNVs. Moreover, downsampled genomes from the Simons Genome Diversity Project attained higher imputation accuracy when using the T2T-CHM13 versus GRCh38 1kGP panel. Our study demonstrates that use of the T2T-CHM13 phased haplotype panel improves statistical phasing and imputation for samples from diverse human populations.
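Switch errors, one of the two accuracy metrics reported above, count places where the inferred phase relative to the truth flips between consecutive heterozygous sites. The abstract does not describe how the authors matched sites against the long-read assemblies, so this minimal sketch assumes biallelic sites coded 0/1, site lists already aligned between inferred and truth haplotypes, and skips any site not heterozygous in both. The function name and toy genotypes are hypothetical.

```python
def count_switch_errors(inferred, truth):
    """Count phasing switch errors between inferred and truth haplotypes.

    inferred, truth: equal-length lists of (hap1_allele, hap2_allele) tuples
    at the same sites in genomic order, alleles coded 0/1.
    Returns (switch_errors, opportunities); the switch error rate is their ratio.
    """
    if len(inferred) != len(truth):
        raise ValueError("site lists must be aligned and equal length")
    relations = []
    for (i1, i2), (t1, t2) in zip(inferred, truth):
        if {i1, i2} != {0, 1} or {t1, t2} != {0, 1}:
            continue  # skip sites that are not heterozygous in both call sets
        relations.append(i1 == t1)  # True if inferred hap1 matches truth hap1
    # A switch is a change in phase relation between consecutive het sites
    switches = sum(1 for a, b in zip(relations, relations[1:]) if a != b)
    return switches, max(len(relations) - 1, 0)


# Toy example: a two-site block phased the wrong way yields two switches.
truth = [(0, 1), (0, 1), (0, 1), (0, 1)]
inferred = [(0, 1), (1, 0), (1, 0), (0, 1)]
print(count_switch_errors(inferred, truth))
```

Under this standard definition a single mis-phased block in the middle of a chromosome costs two switches, one entering and one leaving it.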

20

Long-Term Daily Chlorhexidine Foot Cleansing Reduces Staphylococcal Burden on the Feet of People with Prior Diabetic Foot Complications

Bode, M.; Lydecker, A.; Robinson, G.; Roghmann, M.-C.; Kalan, L.

2026-05-19 dermatology 10.64898/2026.05.14.26352248 medRxiv

Top 0.7%

0.4%

Background: Microbiota dysbiosis of the skin has been implicated in ulcer formation. Individuals with diabetes remain at high risk for diabetic foot ulcers (DFUs) even after ulcer healing. Topical chlorhexidine gluconate (CHG) is a broad-spectrum antiseptic commonly used to reduce microbial burden. In a prior randomized clinical trial comparing daily CHG foot treatment with soap-and-water treatment, no statistically significant reduction in new DFUs was observed, prompting evaluation of whether CHG produced durable changes in the skin microbiota. Objective: To compare changes in foot skin microbiota (including bacterial bioburden, diversity, and community composition) associated with daily CHG versus soap-and-water use over one year in people with diabetes and prior foot complications. Methods: In a single-center, double-blind, placebo-controlled randomized trial, 87 participants were randomized to daily CHG wipes or soap-and-water wipes for 12 months. Foot swabs were collected at baseline, at 3 and 12 months, and 4 weeks after treatment ended. Bacterial bioburden was quantified. Microbiota composition was assessed using 16S rRNA and ITS amplicon sequencing. Key Results: CHG treatment significantly reduced bacterial bioburden, increased microbial diversity, and altered community composition, including sustained reductions in Staphylococcus abundance. Several microbiota changes persisted at the 4-week post-treatment sampling point. Soap-and-water treatment showed similar but smaller and largely nonsignificant trends. Conclusions: Daily CHG use durably modifies foot skin microbiota in high-risk individuals with diabetes. However, this alone may be insufficient to prevent new foot complications, highlighting the need for additional interventions. These findings have implications for long-term CHG use in populations at risk for staphylococcal infections.
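The abstract reports increased microbial diversity without naming the index used. The Shannon index H' = -Σ pᵢ ln pᵢ is one common alpha-diversity measure for 16S or ITS count tables; this minimal sketch assumes a per-sample vector of taxon read counts and does not rarefy or otherwise correct for sequencing depth, which real analyses usually address. The function name and counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts.

    counts: per-taxon read counts for one sample (hypothetical input format).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical samples: one dominated by a single taxon, one evenly spread.
dominated = [90, 5, 5]
even = [34, 33, 33]
print(shannon_index(dominated), shannon_index(even))
```

A community dominated by one genus (as Staphylococcus can be on the foot) scores lower than an evenly spread one, which is why a drop in a dominant taxon can register as increased diversity.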